Checking Google Maps bicycling timing

Google Maps bicycling directions are still "in beta", i.e. they are not as good as driving directions.

I hope to improve one aspect of these directions, timing, by comparing ride times from bike share trip data with Google's estimated times between the same endpoints.
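The comparison itself can be sketched as follows, assuming the bike share trips have already been parsed into (start station, end station, duration in seconds) tuples and that Google's bicycling estimate for each station pair has been fetched separately. The function name, the input format, and the use of the median (less sensitive than the mean to riders who stop along the way) are my own choices for illustration.

```python
from collections import defaultdict
from statistics import median


def observed_vs_google(trips, google_seconds):
    """Ratio of observed median ride time to Google's estimate, per station pair.

    trips: iterable of (start_station, end_station, duration_seconds)
        (hypothetical, already-parsed bike share records).
    google_seconds: dict mapping (start_station, end_station) to Google's
        bicycling duration in seconds for the same endpoints.
    """
    durations = defaultdict(list)
    for start, end, seconds in trips:
        durations[(start, end)].append(seconds)

    ratios = {}
    for pair, values in durations.items():
        estimate = google_seconds.get(pair)
        if estimate:  # skip pairs with no (or zero) Google estimate
            # Median rather than mean, so one long stop doesn't dominate.
            ratios[pair] = median(values) / estimate
    return ratios


trips = [("A", "B", 600), ("A", "B", 660), ("A", "B", 3000), ("B", "C", 300)]
google = {("A", "B"): 600, ("B", "C"): 400}
print(observed_vs_google(trips, google))  # {('A', 'B'): 1.1, ('B', 'C'): 0.75}
```

A ratio above 1.0 means riders on that route are slower than Google predicts; below 1.0, faster.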

Here are the bike share data sources I have found:

Data from Google Maps is probably best accessed through the Distance Matrix API (https://developers.google.com/maps/documentation/distance-matrix/). Its big limitation is that I can only get data for 2,500 trips (origin-destination pairs) per day without paying, so it will take me a month or so just to collect the data for each system.
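A minimal sketch of preparing Distance Matrix requests for bicycling times. The endpoint and the `origins`, `destinations`, `mode` and `key` parameters are the documented ones; the helper names, the (lat, lng) input format, and the 300-station example are my own assumptions. It makes no network calls: it builds the request URL, reads durations out of an already-parsed JSON response, and estimates how many days the free quota would take.

```python
import math
from urllib.parse import urlencode

# Google Maps Distance Matrix web service, JSON output.
ENDPOINT = "https://maps.googleapis.com/maps/api/distancematrix/json"
# Free daily quota quoted above; check Google's current pricing before relying on it.
DAILY_FREE_ELEMENTS = 2500


def _join_points(points):
    """Format (lat, lng) tuples as the pipe-separated list the API expects."""
    return "|".join(f"{lat},{lng}" for lat, lng in points)


def distance_matrix_url(origins, destinations, api_key):
    """Build a bicycling Distance Matrix request URL."""
    params = {
        "origins": _join_points(origins),
        "destinations": _join_points(destinations),
        "mode": "bicycling",
        "key": api_key,
    }
    return ENDPOINT + "?" + urlencode(params)


def durations_from_response(response):
    """Extract travel times in seconds from a parsed JSON response.

    Returns one list per origin; an entry is None when no route was found.
    """
    return [
        [el["duration"]["value"] if el.get("status") == "OK" else None
         for el in row["elements"]]
        for row in response["rows"]
    ]


def days_needed(n_pairs, per_day=DAILY_FREE_ELEMENTS):
    """Days of free quota needed; each origin-destination pair is one element."""
    return math.ceil(n_pairs / per_day)


# Hypothetical 300-station system: every ordered pair of distinct stations.
print(days_needed(300 * 299))  # 36
```

The 36-day figure for a hypothetical 300-station system is consistent with the "month or so" per system estimated above.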

Improving Movie Wiki's algorithm

A friend of mine made an iPhone app that shows the connections, via movies, between actors, directors, and films. Entering an actor, for example, gives a list of the movies that actor has been in and the directors they have worked with, while entering a director gives the movies the director has directed and the actors they have worked with.

The major issue is this: IMDB does not allow commercial use of its data, and this app is on Apple's iTunes Store. Instead, the data comes from the Wikipedia pages of the actors, movies, and directors: every link on a page to an actor or director is treated as a connection. As a result, there are many incorrect data points, most significantly false-positive connections.
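To make the false-positive problem concrete, here is a minimal sketch of the link-based approach, assuming the raw wikitext of a page is available and that `known_people` is a set of Wikipedia titles already classified as actors or directors. The regex, names, and example page are hypothetical. Every linked person counts as a connection, which is exactly how someone who merely wrote the source novel slips in.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target page title.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def linked_people(wikitext, known_people):
    """Naive connection finder: every linked title that is a known person."""
    found = []
    for match in WIKILINK.finditer(wikitext):
        # Wikipedia treats underscores and spaces in titles as equivalent.
        title = match.group(1).strip().replace("_", " ")
        if title in known_people and title not in found:
            found.append(title)
    return found


# Hypothetical page and people; Richard Poe only wrote the source novel,
# yet the naive method still reports him as a connection.
page = ("'''Example Film''' is a 1999 film directed by [[Jane Doe]] and starring "
        "[[John Roe|Roe]]. It was adapted from a novel by [[Richard Poe]].")
people = {"Jane Doe", "John Roe", "Richard Poe"}
print(linked_people(page, people))  # ['Jane Doe', 'John Roe', 'Richard Poe']
```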

Using the text of the Wikipedia pages, and possibly IMDB data (http://www.imdb.com/interfaces) as labels for supervised learning, I hope to come up with an algorithm that estimates the probability that a given connection is a false positive.
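One common way to produce such a probability is logistic regression over features of each candidate link. The sketch below trains it with plain batch gradient descent; the two binary features and the toy labels are hypothetical stand-ins (label 1 meaning IMDB has no matching credit), not features I have validated.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Estimated probability that a link with features x is a false positive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent on the log loss."""
    m, n = len(features), len(features[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            error = predict(x, weights, bias) - y  # d(log loss)/dz for this example
            for j in range(n):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical features per link: [not in infobox or cast list, only in "See also"]
X = [[0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = no matching IMDB credit (false positive)
weights, bias = train_logistic(X, y)
print(round(predict([1, 0], weights, bias), 2), round(predict([0, 0], weights, bias), 2))
```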
To commercialize this product, I must build an algorithm that doesn't directly use IMDB data, i.e. I can't deliver a predictor that was trained directly on IMDB data. Instead, I hope to use IMDB data to learn what is needed to write a basic algorithm that is commercially acceptable.
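One way to use IMDB data without shipping a model trained on it is to write simple rules by hand and only score them offline against IMDB-derived labels. A minimal sketch, assuming each candidate link has been reduced to a dict of features plus a `true_connection` label; the rule and the field names are hypothetical.

```python
def in_cast_rule(link):
    """Hypothetical hand-written rule: keep a link only if the person is named
    in the page's infobox or cast list."""
    return link["in_cast_or_infobox"]


def evaluate_rule(links, rule):
    """Score a rule against offline labels (e.g. derived from IMDB credits).

    links: dicts holding the rule's input fields plus a boolean
        'true_connection' label.
    Returns (precision, recall) for the connections the rule keeps.
    """
    kept = [link for link in links if rule(link)]
    true_kept = sum(link["true_connection"] for link in kept)
    true_total = sum(link["true_connection"] for link in links)
    precision = true_kept / len(kept) if kept else 0.0
    recall = true_kept / true_total if true_total else 0.0
    return precision, recall


# Hypothetical labeled links.
links = [
    {"in_cast_or_infobox": True, "true_connection": True},
    {"in_cast_or_infobox": True, "true_connection": True},
    {"in_cast_or_infobox": True, "true_connection": False},
    {"in_cast_or_infobox": False, "true_connection": True},
    {"in_cast_or_infobox": False, "true_connection": False},
]
precision, recall = evaluate_rule(links, in_cast_rule)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

Here 1 - precision is the fraction of kept connections that are false positives, the error type identified above as most significant.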

The bad news about data access is that the Wikipedia APIs limit how quickly data can be requested, so it will take about two weeks to download the page text.
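A minimal sketch of fetching page text politely through the MediaWiki API (`action=query`, `prop=revisions`, `rvprop=content`). Batching 50 titles per request reflects the usual limit for non-bot clients, and the one-second pause between requests is my assumption rather than an official rate; the helper names are hypothetical. The code only builds and paces URLs; the HTTP requests themselves are left out.

```python
import time
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"
MAX_TITLES = 50  # usual per-request limit on `titles` for non-bot clients


def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def wikitext_query_urls(titles):
    """Yield one MediaWiki API URL per batch, requesting each page's current wikitext."""
    for batch in chunks(list(titles), MAX_TITLES):
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "rvslots": "main",
            "titles": "|".join(batch),
            "format": "json",
        }
        yield API + "?" + urlencode(params)


def paced(urls, seconds_between=1.0, sleep=time.sleep):
    """Yield URLs with a pause before each one after the first.

    `sleep` is injectable so the pacing can be tested without waiting.
    """
    for i, url in enumerate(urls):
        if i:
            sleep(seconds_between)
        yield url
```

Each URL from `paced(wikitext_query_urls(titles))` would then be fetched, e.g. with `urllib.request`, sending a descriptive User-Agent header as Wikimedia asks API clients to do.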

Predicting taxi drop-offs based on demographic data

Using demographic data, such as:

  • Census/American Community Survey (ACS) data:
    • Income, commute data, and much, much more.
    • The decennial census attempts to count every single US resident, but it mostly records population counts, with only a little other information.
    • The ACS surveys a random sample and collects a wide range of demographic information, though some estimates have large margins of error, especially for small areas such as census tracts.
    • There are many sources; I've used https://nhgis.org/, as it lets you download just the columns you want instead of wasting disk space and processing time on data you don't need.
    • The Census Bureau provides shapefiles of the geographic areas it uses at https://www.census.gov/geo/maps-data/data/tiger-line.html. I divided my data by census tract, so I downloaded the census tract shapefiles.
  • Google Maps distance and travel-time data.
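Because some ACS estimates are noisy, it can help to convert the published margins of error (MOEs) into standard errors before using the estimates as features. ACS MOEs are published at the 90% confidence level, so SE = MOE / 1.645; the coefficient of variation and the root-sum-of-squares rule for sums of estimates follow the Census Bureau's approximation guidance. The example numbers below are hypothetical.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def standard_error(moe):
    """Convert a published 90% margin of error to a standard error."""
    return moe / Z_90


def coefficient_of_variation(estimate, moe):
    """CV in percent; high values flag unreliable estimates."""
    return 100 * standard_error(moe) / estimate


def moe_of_sum(moes):
    """Approximate MOE when adding estimates (e.g. counts across tracts, not medians)."""
    return math.sqrt(sum(m * m for m in moes))


# Hypothetical tract: median household income 52,000 with MOE 9,870.
print(round(standard_error(9870)))                      # 6000
print(round(coefficient_of_variation(52000, 9870), 1))  # 11.5
print(moe_of_sum([30, 40]))                             # 50.0
```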

I hope to predict the number of taxi trip drop-offs in a given area. Taxi trip data is available from NYC government websites: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and https://data.cityofnewyork.us/data?agency=Taxi+and+Limousine+Commission+%28TLC%29&cat=&type=new_view&browseSearch=&scope= (which includes only the newer data).
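To line taxi counts up with tract-level demographics, each drop-off has to be assigned to a census tract. A minimal sketch, assuming the trip records include drop-off longitude/latitude and that each tract's boundary has already been read from the shapefile as a single outer ring of (lon, lat) vertices; holes and multi-part tracts are ignored, and the function names are hypothetical. It uses the standard ray-casting point-in-polygon test; a GIS library's spatial join (for example `geopandas.sjoin`) would do the same job far faster.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray casting: count edge crossings of a ray from (x, y) toward +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dropoffs_per_tract(dropoffs, tracts):
    """Count drop-offs per tract.

    dropoffs: iterable of (lon, lat) drop-off coordinates.
    tracts: dict mapping tract ID to an outer ring of (lon, lat) vertices.
    """
    counts = Counter()
    for lon, lat in dropoffs:
        for tract_id, ring in tracts.items():
            if point_in_polygon(lon, lat, ring):
                counts[tract_id] += 1
                break  # tracts don't overlap, so stop at the first match
    return counts


# Hypothetical unit-square "tracts" and drop-off points.
tracts = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)], "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
drops = [(0.5, 0.5), (1.5, 0.5), (1.2, 0.8), (3, 3)]
print(dropoffs_per_tract(drops, tracts))  # Counter({'B': 2, 'A': 1})
```

Points outside every tract, like (3, 3) above, are simply not counted.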

